Conditional Random Field-based Parser and Language Model for Traditional Chinese Spelling Checker

Authors

  • Yih-Ru Wang
  • Yuan-Fu Liao
  • Yeh-Kuang Wu
  • Liang-Chun Chang
Abstract

This paper describes our Chinese spelling check system submitted to the SIGHAN Bake-off 2013 evaluation. The main idea is to replace each potential error character with its confusable characters and rescore the modified sentence using a conditional random field (CRF)-based word segmentation/part-of-speech (POS) tagger and a tri-gram language model (LM), in order to detect and correct possible spelling errors. Experimental results on the Bake-off 2013 tasks show that the proposed method achieved 0.50 location detection and 0.24 error location F-scores in subtask 1, and 0.49 location accuracy, 0.40 correction accuracy, and 0.40 correction precision in subtask 2.
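The substitute-and-rescore idea in the abstract can be illustrated with a minimal sketch. It is not the authors' system: the CRF segmenter/POS tagger and word tri-gram LM are replaced here by a toy add-one-smoothed character bigram model, and the names `train_bigram_scorer`, `correct_sentence`, the confusion set, the three-sentence corpus, and the `threshold` parameter are all hypothetical, chosen only for illustration.

```python
import math
from collections import Counter


def train_bigram_scorer(corpus):
    """Build a toy character-bigram log-probability scorer (add-one smoothing).

    Assumption: this stands in for the paper's CRF tagger + tri-gram LM.
    """
    context, bigrams, vocab = Counter(), Counter(), set()
    for line in corpus:
        padded = "^" + line + "$"  # sentence boundary markers
        vocab.update(padded)
        for a, b in zip(padded, padded[1:]):
            context[a] += 1
            bigrams[a, b] += 1
    v = len(vocab)

    def score(sentence):
        padded = "^" + sentence + "$"
        return sum(
            math.log((bigrams[a, b] + 1) / (context[a] + v))
            for a, b in zip(padded, padded[1:])
        )

    return score


def correct_sentence(sentence, confusion_sets, score, threshold=0.0):
    """Try every single-character substitution from the confusion sets and
    keep the one with the largest score gain, if that gain exceeds threshold.

    Returns (sentence, error_position), with position None if nothing changed.
    """
    base = score(sentence)
    best, best_gain, best_pos = sentence, threshold, None
    for i, ch in enumerate(sentence):
        for cand in confusion_sets.get(ch, ()):
            trial = sentence[:i] + cand + sentence[i + 1:]
            gain = score(trial) - base
            if gain > best_gain:
                best, best_gain, best_pos = trial, gain, i
    return best, best_pos


# Hypothetical toy data: a tiny corpus and one pair of confusable characters.
score = train_bigram_scorer(["我們去學校", "他們去學校", "我們回家"])
confusion = {"門": ["們"], "們": ["門"]}
print(correct_sentence("我門去學校", confusion, score))
```

The threshold plays the role of a decision boundary: raising it makes the sketch more conservative, so fewer correct characters are replaced at the cost of missing some errors.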

Similar resources

Word Vector/Conditional Random Field-based Chinese Spelling Error Detection for SIGHAN-2015 Evaluation

In order to detect Chinese spelling errors, especially in essays written by foreign learners, a word vector/conditional random field (CRF)-based detector is proposed in this paper. The main idea is to project each word in a test sentence into a high-dimensional vector space in order to reveal and examine the relationships between words by using a CRF. The results are then utilized to constrain the time-con...

NCTU and NTUT's Entry to CLP-2014 Chinese Spelling Check Evaluation

This paper describes our Chinese spelling check system submitted to the SIGHAN Bake-off 2014 evaluation. The system's main components are still the conditional random field (CRF)-based word segmentation/part-of-speech (POS) tagger and tri-gram language model (LM) used last year. However, we tried to refine the misspelling rules and decision-making threshold and to improve LM rescoring speed to reduce false ala...

A Novel Approach to Conditional Random Field-based Named Entity Recognition using Persian Specific Features

Named Entity Recognition is an information extraction technique that identifies named entities in a text. Three methods have conventionally been used to extract named entities from a text: rule-based, machine-learning-based, and a hybrid of the two. Machine-learning-based methods perform well on the Persian language if they are trained with good features. To get good performanc...

Introduction to NJUPT Chinese Spelling Check Systems in CLP-2014 Bakeoff

Chinese spelling check (CSC) is an essential issue in the research field of Chinese language processing (CLP). This paper describes the details of two CSC systems we developed to solve this problem. The first system was built on a CRF model, and its modules include word segmentation, error detection and error correction. The other system was based on 2Chars&&3-Chars model, and ...

Design and implementation of Persian spelling detection and correction system based on Semantic

The Persian language has special features (grapheme, homophone, and multi-shape clinging characters) on electronic devices. Furthermore, the design and implementation of NLP tools for Persian are more challenging than for other languages (e.g. English or German). Spelling tools are widely used for editing user texts such as emails and text in editors. Also developing Persian tools will provide Persian progr...

Publication date: 2013